Massively Parallel Sequence Analysis with Hidden Markov Models

Author

  • BERTIL SCHMIDT
Abstract

Molecular biologists widely use Hidden Markov Models (HMMs) to describe protein sequence families statistically. Such a description can then be used for sensitive and selective database scanning. Although efficient dynamic programming algorithms exist for this problem, scanning times remain very high, and the exponential growth of sequence databases makes fast solutions highly important to research in this area. In this paper we illustrate how massive parallelism can be used for efficient sequence analysis with HMMs. We present two new techniques for parallelizing the dynamic programming calculation: “diagonal-by-diagonal” and “row-by-row”. Both lead to significant runtime savings on our hybrid parallel system, which is built from commodity components to achieve high performance at low cost. The architecture combines a coarse-grained PC cluster linked by a high-speed network with fine-grained SIMD processor arrays connected to each node.
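The abstract does not spell out the recurrences or the parallel schedules, so the following is only a minimal sketch of the general idea behind a diagonal-by-diagonal evaluation order, not the paper's implementation. It assumes a generic alignment-style recurrence in which each cell depends on its left, upper, and upper-left neighbours; the scoring values and the function names (`fill_sequential`, `fill_by_diagonals`) are hypothetical. Under that assumption, all cells on one anti-diagonal are mutually independent and could be computed in lockstep by a processor array.

```python
# Sketch only: a generic alignment-style recurrence stands in for the HMM
# recurrences, which the abstract does not give. Scores are made-up values.
MATCH, MISMATCH, GAP = 2, -1, -1


def init_table(n, m):
    """Create an (n+1) x (m+1) table with gap penalties on the borders."""
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * GAP
    for j in range(1, m + 1):
        table[0][j] = j * GAP
    return table


def cell(table, x, y, i, j):
    """Value of cell (i, j) from its upper-left, upper and left neighbours."""
    sub = MATCH if x[i - 1] == y[j - 1] else MISMATCH
    return max(table[i - 1][j - 1] + sub,
               table[i - 1][j] + GAP,
               table[i][j - 1] + GAP)


def fill_sequential(x, y):
    """Reference order: one cell at a time, row after row."""
    n, m = len(x), len(y)
    table = init_table(n, m)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = cell(table, x, y, i, j)
    return table


def fill_by_diagonals(x, y):
    """Wavefront order: anti-diagonal d holds all cells with i + j == d."""
    n, m = len(x), len(y)
    table = init_table(n, m)
    for d in range(2, n + m + 1):
        rows = range(max(1, d - m), min(n, d - 1) + 1)
        # Every cell on this diagonal reads only diagonals d-1 and d-2,
        # so the whole list could be evaluated in parallel (one cell per PE).
        values = [cell(table, x, y, i, d - i) for i in rows]
        for i, value in zip(rows, values):
            table[i][d - i] = value
    return table


if __name__ == "__main__":
    a, b = "HEAGAWGHEE", "PAWHEAE"
    print(fill_sequential(a, b) == fill_by_diagonals(a, b))
```

Computing the whole list `values` before writing anything back makes the independence explicit: no cell on a diagonal reads another cell of the same diagonal, which is what allows a SIMD array to process the diagonal in a single step.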


Similar resources

Massively Parallel Biosequence Analysis

Massive parallelism is required for the analysis of the rapidly growing biosequence databases. First, this paper compares and benchmarks methods for dynamic programming sequence analysis on several parallel platforms. Next, a new hidden Markov model method and its implementation on several parallel machines is discussed. Finally, the results of a series of experiments using this massively paral...


Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...


Parallel Characteristic Extraction from Protein Sequence Database

An adaptive massively parallel system for flexible information processing has been investigated. This research requires a feedback from the real application. In this paper, a parallel characteristic extraction from the protein sequence database is described. Since the protein sequence database is huge and sequences have variety, an adaptive massively parallel system is mandatory. An HMM (hidden...


IMAGE SEGMENTATION USING GAUSSIAN MIXTURE MODEL

Stochastic models such as mixture models, graphical models, Markov random fields and hidden Markov models play a key role in probabilistic data analysis. In this paper, we fit a Gaussian mixture model to the pixels of an image. The parameters of the model are estimated by the EM algorithm. In addition, the label corresponding to each pixel of the true image is assigned by Bayes' rule. In fact, ...


Image Segmentation using Gaussian Mixture Model

Abstract: Stochastic models such as mixture models, graphical models, Markov random fields and hidden Markov models play a key role in probabilistic data analysis. In this paper, we fitted a Gaussian mixture model to the pixels of an image. The parameters of the model were estimated by the EM algorithm. In addition, the label corresponding to each pixel of the true image was assigned by Bayes' rule. In fact,...




Publication date: 2002